Description of the Related Art

Traditional integrated circuit chips, also known as computer chips, are dedicated
to a single function, with the chips attached to one another at a circuit board level. However, the
number and types of circuits that can be placed on a computer chip have continued to advance at a
rapid pace. It is now possible to include circuits for many different functions on a single chip to
create a complete "system on a chip."

Designing systems on a chip can be daunting. In particular, providing for
communication between different on-chip integrated components can be difficult. Furthermore,
traditional design approaches tend not to be scalable to systems that involve increasing numbers
of on-chip components.

Each function on a multi-functional single chip is implemented by an independently
operating module. To function, each module exchanges data with another module. These
modules function as a data transfer pair. As the number of functions on a single chip increases,
multiple data transfer pairs are needed to simultaneously transfer data. In a traditional time
domain shared bus, only one data transfer pair can transfer data on the shared bus at any given
time. Thus, in the event that multiple data transfer pairs need to simultaneously transfer data,
only one pair can have access to the bus at a time and the other pairs must wait. In a switch
fabric, each module has a communication path from itself to all other modules; and thus, if the
target module is not currently engaged in a data transfer, it can accept data from an initiator
without contention with other data transfers that may be simultaneously occurring.
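By way of illustration only, the contrast between a time domain shared bus and a switch
fabric can be sketched in a few lines of Python. The function names, the in-order scheduling
rule, and the example transfer list are assumptions made for this sketch and do not appear in
the disclosure.

```python
def shared_bus_cycles(transfers):
    """Cycles needed when only one data transfer pair may use the bus at a time."""
    return sum(length for _src, _dst, length in transfers)


def switch_fabric_cycles(transfers):
    """Cycles needed when transfers to different targets may overlap.

    Simplifying assumption: transfers are scheduled in list order, and a
    transfer waits only while its own initiator or its own target is busy.
    """
    target_free_at = {}
    initiator_free_at = {}
    finish = 0
    for src, dst, length in transfers:
        start = max(target_free_at.get(dst, 0), initiator_free_at.get(src, 0))
        end = start + length
        target_free_at[dst] = end
        initiator_free_at[src] = end
        finish = max(finish, end)
    return finish


# Hypothetical workload: (initiator, target, length in cycles).
transfers = [("A", "B", 4), ("C", "D", 4), ("E", "B", 2)]
print(shared_bus_cycles(transfers))     # 10
print(switch_fabric_cycles(transfers))  # 6
```

In this example the transfers A to B and C to D overlap on the switch fabric, and only the
second transfer aimed at target B has to wait.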




136.1005.01 

Summary of the Invention

Accordingly, what is needed is a system for providing simultaneous
communication among on-chip integrated components. This system should be flexible enough to
accommodate different types of components. The system also should allow for easy integration
of the components. Furthermore, the system should be easily scalable, in terms of both
bandwidth and connectivity, to provide communication between increasing numbers of
integrated components.

The invention addresses the foregoing needs by providing a system that includes
an on-chip communication switch fabric for use by on-chip components. Preferably, the system
uses a zero-wait-state packet-based communication protocol. The primary reason for
packet-based data transfers is that any target may have multiple initiators desiring to transfer
data to it at any given time. By using a zero-wait-state packet-based data transfer, the initiator
is forced to transfer data every clock cycle, which maximizes the data transfer bandwidth to the
target. A second reason is that by limiting the packet size, the arbiter must frequently
re-arbitrate and grant the bus; this ensures that the bus will operate in accordance with the
priority scheme that the arbiter is designed to implement. Each target and initiator has exactly
the same interface signals and timing, greatly reducing learning costs for chip developers. The
system also preferably uses multiplexors for signal selection, with the multiplexors being
constructed from plural smaller multiplexors that can be distributed across a chip. This feature
allows the system to be spread out across a chip, facilitating scalability. Furthermore, in a
preferred embodiment, the system can use a different clock domain from the components,
allowing for greater flexibility in chip design. Each component, as well as the system, may be
in an independent clock domain.

Accordingly, one embodiment of the invention is a system for communication on
a chip. The system includes an on-chip communication bus including plural tracks, and a
plurality of stations that couple a plurality of on-chip components to the on-chip communication
bus. The plurality of on-chip components use the tracks to communicate. Preferably, the stations
use a packet-based communication protocol. Each component has a dedicated track which it can
use to send information to any or all other components.

Examples of on-chip components that can utilize the invention include, but are not
limited to, a PCI bridge, a USB interface, and an I2C interface. Other examples include a UART
interface, a DDR and/or SDRAM, an Ethernet interface, a general I/O interface, and other
components.

In a preferred embodiment, each station includes an initiator that requests
permission to transmit outgoing data over a track to another station and that transmits the
outgoing data, an arbiter that evaluates requests from other stations and selects a track on which
to receive incoming data, and a target that receives the incoming data. The arbiter is constructed
to receive requests of varying priorities and to grant access based upon those priorities. The
initiator can be connected to a grant multiplexor for selecting a grant line, and the arbiter can be
connected to a track multiplexor for selecting a track. In order to facilitate scalability, these
multiplexors can be constructed from plural smaller multiplexors distributed across the chip.
The plurality of tracks and multiplexors preferably implement a crossbar switch.

Each station can also include a source queue for queuing outgoing data and a
destination queue for queuing incoming data. These queues preferably are first-in-first-out
registers. The source queue and the destination queue can serve to separate a clock domain for
the on-chip communication bus from clock domains for the plurality of on-chip components.
Thus, components that run at different clock speeds can be more easily accommodated than in
traditional systems.

In order to provide for even greater flexibility, more than one of the plurality of
on-chip components can be coupled to the on-chip communication bus through one of the
stations. This arrangement is particularly useful for connecting plural slower components to the
bus, with the benefit that memory and routing resources can be conserved.

Each station also preferably includes or is connected to a watchdog circuit that
determines if its station has gone offline. If a watchdog circuit determines that its station has
gone offline, that watchdog circuit informs a controller connected to the system. The controller
can then re-route or block communications to that station, thereby helping to prevent the offline
station from interfering with normal communications between components across the system.

The invention also includes methods for performing the foregoing operations, as
well as other embodiments of the invention.

This brief summary has been provided so that the nature of the invention may be
understood quickly. A more complete understanding of the invention may be obtained by
reference to the following description of the preferred embodiments thereof in connection with
the attached drawings.

Figure 1 shows an overview of an on-chip communication system according to the
invention.

Figure 2 illustrates one possible embodiment of an on-chip communication system 
according to the invention. 

Figure 3 illustrates one possible embodiment of a station for an on-chip 
communication system according to the invention. 

Figure 4 illustrates one possible arrangement for interconnecting track lines for 
stations according to the invention. 

Figure 5 illustrates one possible arrangement for interconnecting grant lines for 
stations according to the invention. 

Figure 6 illustrates one possible arrangement for plural on-chip components to 
share a single station according to the invention. 



Figure 7 is a flowchart for explaining communication between components across 
an on-chip communication system according to the invention. 



Figure 8 illustrates a technique for interconnecting stations using smaller
multiplexors to improve scalability according to the invention.

Description of the Preferred Embodiment 



Related Disclosure

Inventions described in this disclosure can be used in conjunction with inventions
described in the following application: Application No. ____, filed ____ in the name of
inventor Jac..., "... for Ordering in Multi-Path Switching Fabrics," Express Mail No. ____.
This application is hereby incorporated by reference as if fully set forth herein.


Lexicography 

Chip: An integrated circuit chip. Examples include, but are not limited to, a 
central processing unit, digital signal processing chip, memory manager, or complete "system-on- 
a-chip." 

System-on-a-Chip: A chip that contains all circuits necessary for implementing a 
complete system, for example for a basic computer. 

Component: A subset of circuits on a chip that performs a particular function or
operation. Examples include, but are not limited to, a PCI (peripheral component interconnect)
bridge, a USB (universal serial bus) interface, an I2C (inter-integrated-circuit) interface, a UART
(universal asynchronous receiver transmitter) interface, a DDR (data direction register) and/or
SDRAM (synchronous dynamic random access memory), an Ethernet interface, a general I/O
(input/output) interface, and other circuits and interfaces. Components also can be referred to as
peripherals.

Station: A port to an on-chip communication bus according to the invention.

Clock Domain: A subset of circuits or components that uses a common clock
signal.

Packet-Based Protocol: A communication protocol in which data is sent in 
packets, typically along with header information for the data. 

Split-Response Transaction: A two-stage operation that is split over two 
transactions, namely a request operation and a completion operation. In a split-response read 
transaction, a first station sends a read request to a second station. The second station responds 
to the read request command by initiating a read completion operation to write the requested data 
to the first station. 

Head-of-Line Blocking: Blocking that occurs when transmission of data at the
front of a source queue is delayed because it is intended for a station or component that is busy,
thereby blocking transmission of data deeper in the source queue that is intended for a station or
component that is not busy.

Queue: A register or memory that stores data while the data awaits transmission
or other processing.

FIFO (First In First Out) Register: A register that orders data such that data is sent
from the register in the order that the data was received by the register.

Overview

Figure 1 shows an overview of an on-chip communication system according to the
invention.

Chip 1 in Figure 1 includes plural on-chip components that communicate using an
on-chip communication system. The components in Figure 1 are PCI bridge 2, USB interface 3,
UART interface 4, I2C interface 5, DDR and SDRAM 6, EEPROM 7, Ethernet interface 8,
general I/O interface 9, and other components 10 and 11. Each of these components is connected
to on-chip communication bus 12 through stations 13 to 22, respectively. Thus, components 2 to
11 can communicate with each other through stations 13 to 22 and on-chip communication bus
12. The invention is not limited to the particular number and/or types of components shown in
Figure 1.

According to the invention, on-chip communication bus 12 includes plural tracks.
These plural tracks allow more than one component to communicate with another component
simultaneously.

Each track preferably includes lines for data bits and other control information.
For example, one embodiment of a track includes lines for 64 bits of data, eight command/byte
enable (C/BE) signals, two parity signals (one per double word of data), a start of packet signal,
and an end of packet signal.
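As a non-limiting illustration, the example track described above can be tallied in Python.
The class and field names are hypothetical, and the parity helper assumes that each parity
signal is the XOR of the bits of one 32-bit double word; the disclosure does not state the
parity convention.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrackLines:
    """Line counts for the example track (names are illustrative only)."""
    data: int = 64
    command_byte_enable: int = 8
    parity: int = 2
    start_of_packet: int = 1
    end_of_packet: int = 1

    def total(self) -> int:
        return sum(getattr(self, f.name) for f in fields(self))


def parity_bits(word64: int) -> tuple:
    """Return one parity bit per 32-bit double word (low word first).

    Assumption: each parity bit is the XOR of the bits in its double word.
    """
    low = word64 & 0xFFFFFFFF
    high = (word64 >> 32) & 0xFFFFFFFF
    return (bin(low).count("1") & 1, bin(high).count("1") & 1)


track = TrackLines()
print(track.total())                    # 76 lines per track
print(parity_bits(0x0000000100000003))  # (0, 1)
```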



On-chip communication bus 12 preferably uses a packet-based communication
protocol. Use of such a protocol simplifies a chip designer's task in developing and/or
modifying components to communicate through the on-chip communication bus and reduces the
time that an initiator consumes for a given size data transfer. The underlying principle is that a
station does not initiate a data transfer until it is ready to communicate quickly.

Station Design and Interconnection

Figure 2 illustrates one possible embodiment of an on-chip communication system
according to the invention. Figure 2 is a high-level diagram that shows the basic functionality
used by stations according to the invention.

Briefly, a system for communication on a chip includes an on-chip
communication bus including plural tracks, and a plurality of stations that couple a plurality of
on-chip components to the on-chip communication bus. Each station has a dedicated track which
it can use to send information to other stations.

In Figure 2, stations 25 to 28 intercommunicate through switch fabric 29, which
includes on-chip communication bus 12. Of course, the invention is not limited to four stations,
and the stations need not be constructed and arranged as shown in Figure 2.

Each of stations 25 to 28 is constructed similarly. Station A 25 includes
transmitter 31, requester 32, receiver 33 and arbiter 34. Station B 26 includes transmitter 36,
requester 37, receiver 38 and arbiter 39. Station C 27 includes transmitter 41, requester 42,
receiver 43 and arbiter 44. Station D 28 includes transmitter 46, requester 47, receiver 48 and
arbiter 49. While the transmitters, requesters, receivers and arbiters are shown as separate blocks
in Figure 2, these functions can be combined in a single circuit or block.

Transmitter 31 of station A 25 is responsible for transmitting data to switch fabric
29. In Figure 2, clocking of data from transmitter 31 is enabled by requester 32 through a clock
enable (CLKEN) signal.

Before requester 32 of station A 25 enables transmission of data, requester 32
sends a request (REQ) signal to each of the other stations connected to switch fabric 29. In a
preferred embodiment of the invention, the request signals are multi-bit signals that incorporate
different levels of priority for requests. For example, in a preferred embodiment, each request
line is three bits wide to allow for seven different request priority levels (plus a no-request level
of 000). When requester 32 receives a grant (GNT) signal from one of the other stations in
response to the request signal, requester 32 enables transmission of data from transmitter 31.
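By way of example only, the three-bit request encoding can be sketched as follows. The
constant and function names are hypothetical, and treating a larger value as a higher
priority is an assumption; the disclosure states only that 000 means no request and that
seven priority levels are available.

```python
REQUEST_BITS = 3
NO_REQUEST = 0b000


def encode_request(priority: int) -> int:
    """Return the 3-bit REQ value for a priority from 1 to 7."""
    if not 1 <= priority <= (1 << REQUEST_BITS) - 1:
        raise ValueError("priority must be between 1 and 7")
    return priority


def is_request(req: int) -> bool:
    """True if the 3-bit REQ value carries a request of any priority."""
    return (req & 0b111) != NO_REQUEST


# Every 3-bit value except 000 is a usable priority level.
levels = [value for value in range(1 << REQUEST_BITS) if value != NO_REQUEST]
print(len(levels))  # 7
```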

Station A 25 also can receive data, in particular through receiver 33. Arbiter 34 of
station A 25 arbitrates and controls what data is sent to station A 25 from the other stations.
Arbiter 34 performs this arbitration based on the priorities of request signals sent from the other
stations. Arbiter 34 controls what data is sent to station A 25 by sending various grant signals in
response to those request signals. This arrangement, in which a station can select what data is
sent to that station, allows implementation of a split-response transaction model for
communication over switch fabric 29.

Stations B 26 to D 28 operate similarly to station A 25.

The components connected to each of the stations are not shown in Figure 2. These
components provide the data sent by the transmitters and receive the data received by the
receivers. One or more such components can be connected to each station.

Other elements also can be included in the on-chip communication system
according to the invention. For example, the system can include system registers for storing
system parameters and a system controller for controlling system operation. These system
registers and system controller preferably are connected to the on-chip communication system
through their own station. The system also can include other special stations, watchdog circuits,
and other elements.

Figure 3 illustrates a preferred embodiment of a station for an on-chip
communication system according to the invention.

In Figure 3, station 50 connects component 51 to switch fabric 52, which includes
on-chip communication bus 53 with plural tracks. Thus, in order for component 51 to
communicate with other components across on-chip communication bus 53, component 51
transmits data to and receives data from station 50. Station 50 in turn communicates with other
stations through switch fabric 52, and those stations communicate with their respective
components.

Station 50 preferably includes initiator 54, target arbiter 55, and target 56.
Initiator 54 requests permission to transmit outgoing data over a track to another station and
transmits the outgoing data. Target arbiter 55 evaluates requests from other stations and selects a
track on which to receive incoming data. Target 56 receives the incoming data.

Compared to the stations shown in Figure 2, initiator 54 performs the functions of
both a requester and a transmitter shown in Figure 2. Target arbiter 55 performs the functions of
an arbiter shown in Figure 2. Target 56 performs the functions of a receiver shown in Figure 2.

Returning to Figure 3, initiator 54 is connected to multiplexor 57, which in turn is
connected to on-chip communication bus 53. Likewise, target 56 is connected to multiplexor 58,
which also is connected to on-chip communication bus 53.

The multiplexors for all stations connected to the on-chip communication bus
along with the tracks of the bus form switch fabric 52, which preferably implements a crossbar
switch. The switch fabric also can include other elements, as discussed in more detail with
respect to Figure 8. This switch fabric serves to switch data between stations over the tracks of
on-chip communication bus 53. The switch fabric also can switch grant signals and other control
data.

In order for the invention to utilize the plural tracks of on-chip communication
bus 53, switch fabric 52 preferably is a multi-path switch fabric. In a preferred embodiment, this
multi-path switch fabric is substantially equivalent to a crossbar switch, except that the
invention preferably utilizes arbitration based on request signals to determine switching as
opposed to conventional scheduling.

Initiator 54 in Figure 3 also is connected to source queue 60, and target arbiter 55
and target 56 are connected to destination queue 61. These queues preferably are first-in-first-out
(FIFO) registers.

Queues 60 and 61 allow component 51 to operate in a different clock domain (i.e.,
using a different clock speed and/or clock) from the on-chip communication bus, and thus in a
different clock domain from other components. Figure 3 shows on-chip communication bus
clock domain 63 on one side of queues 60 and 61, and component clock domain 64 on the other
side of queues 60 and 61.

Different clock domains can be accommodated because data can be clocked into
the queues at a different rate than the data is clocked out. This provides chip designers with
greater flexibility in designing chips and integrating different components into those chips as
compared to systems in which only one or a few clock domains can be accommodated.

Of course, the invention does not require that components run in different clock
domains. Components can run in the same clock domain as the on-chip communication bus
and/or each other, if so desired.
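The rate-decoupling role of the queues can be illustrated, under simplifying assumptions,
with a small Python simulation. The function name, the tick-based clock model, the queue
depth and the example periods are all hypothetical; an actual implementation would also
need synchronizing logic between the two clocks that is not modeled here.

```python
from collections import deque


def simulate_fifo(words, write_period, read_period, depth, ticks):
    """Model a FIFO written every write_period ticks and read every read_period ticks.

    Returns the words in the order they were read and the peak occupancy.
    Within a tick the write is evaluated before the read (a modeling choice).
    """
    fifo = deque()
    pending = deque(words)
    received = []
    peak = 0
    for tick in range(ticks):
        if tick % write_period == 0 and pending and len(fifo) < depth:
            fifo.append(pending.popleft())       # writer side clock edge
        if tick % read_period == 0 and fifo:
            received.append(fifo.popleft())      # reader side clock edge
        peak = max(peak, len(fifo))
    return received, peak


# A fast writer (every tick) feeding a slow reader (every third tick).
received, peak = simulate_fifo(list(range(8)), write_period=1, read_period=3,
                               depth=4, ticks=40)
print(received)  # words arrive in first-in-first-out order
print(peak)      # never exceeds the queue depth
```

The writer simply stalls when the queue is full, so no word is lost even though the two
sides run at different rates.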

Source queue 60 is connected to packetizer 66, and destination queue 61 is 
connected to de-packetizer 67. The packetizer and de-packetizer allow component 51 to 
communicate with station 50 using a simplified packet-based protocol. Use of such a protocol 
simplifies the task of connecting a component to a station according to the invention, thereby 
reducing learning costs for chip designers using the invention. 

A preferred embodiment of the packet protocol uses a 64 bit header and variable- 
sized payloads. Up to 32 payloads preferably can be sent with each header. The preferred 
embodiment of the header includes the following fields: station ID, report bit, long address bit, 
priority field, tag field, payload count, and address. 

The station ID is 5 bits and identifies the source of the packet. It is assigned by 
the chip designer. 

The report bit indicates whether or not a destination station should report to a
source station with a "completed without error" message after completion of a data transfer or
other command without an error.

The long address bit indicates that the first 24 bits of the first payload after the
header contain additional address information.

The priority field holds a 3 bit priority level for the packet. This priority
preferably matches the priority of the request signal sent for the packet.

The tag field is a 5 bit field used to uniquely identify split-response transaction
requests. These types of requests are used in read operations, as discussed in more detail below
with reference to Figure 7.

The payload count contains 9 bits that indicate how many packets of payload are
associated with and will follow the header.

The address field stores a 40 bit address for the data. This address preferably is
with respect to an address space assigned to the station, and thereby to the component(s)
connected to the station.
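The field widths given above sum to exactly 64 bits (5 + 1 + 1 + 3 + 5 + 9 + 40). By way of
illustration only, a header could be packed and unpacked as sketched below. The field names
and the bit ordering (fields placed from most significant to least significant in the order
listed) are assumptions for this sketch; the disclosure does not specify the bit positions.

```python
# (name, width in bits), in the order the fields are listed in the text.
HEADER_FIELDS = [
    ("station_id", 5),
    ("report", 1),
    ("long_address", 1),
    ("priority", 3),
    ("tag", 5),
    ("payload_count", 9),
    ("address", 40),
]
assert sum(width for _name, width in HEADER_FIELDS) == 64


def pack_header(**values) -> int:
    """Pack the named fields into one 64-bit integer (missing fields are 0)."""
    header = 0
    for name, width in HEADER_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        header = (header << width) | value
    return header


def unpack_header(header: int) -> dict:
    """Split a 64-bit header back into its fields."""
    result = {}
    for name, width in reversed(HEADER_FIELDS):
        result[name] = header & ((1 << width) - 1)
        header >>= width
    return result


example = pack_header(station_id=3, report=1, priority=5, tag=17,
                      payload_count=4, address=0x12345678)
print(unpack_header(example)["tag"])  # 17
```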

Other arrangements for a station, component, switch fabric and packet layout are
possible and also fall within the scope of the invention.

Figure 4 illustrates one possible arrangement for interconnecting track lines for
stations according to the invention.

In Figure 4, initiators 69 to 72 are connected to targets 74 to 77 through
multiplexors 79 to 82. The initiators, targets and multiplexors are connected such that data sent
over a track from an initiator at any station can be received by a target at any other station. The
multiplexors in Figure 4 correspond to multiplexor 58 in Figure 3. Thus, when a station's target
arbiter sends a grant signal to another station, that target arbiter uses the station's track
multiplexor to select the corresponding track for receiving data from the other station.
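As a non-limiting sketch, the track selection of Figure 4 can be modeled in Python as one
multiplexor per receiving station, each choosing among the tracks driven by the other
stations. The function name, the dictionary representation and the example values are
hypothetical.

```python
def route_tracks(track_data, selections):
    """Return what each receiving station sees through its track multiplexor.

    track_data: sending station -> word currently driven on its dedicated track.
    selections: receiving station -> sending station it has granted, or None.
    """
    received = {}
    for target, source in selections.items():
        if source is None:
            continue                      # this target granted no request
        if source == target:
            raise ValueError("a station does not select its own track")
        received[target] = track_data[source]
    return received


# Four stations, each driving its own track; B has granted A and D has granted C.
track_data = {"A": "a0", "B": "b0", "C": "c0", "D": "d0"}
selections = {"A": None, "B": "A", "C": None, "D": "C"}
print(route_tracks(track_data, selections))  # {'B': 'a0', 'D': 'c0'}
```

Both transfers proceed in the same cycle because each target selects a different track.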

Figure 5 illustrates one possible arrangement for interconnecting grant lines for
stations according to the invention.

In Figure 5, target arbiters 84 to 87 are connected to initiators 89 to 92 through
multiplexors 94 to 97. The target arbiters, initiators and multiplexors are connected such that
grant signals sent from an arbiter at any station can be received by an initiator at any other
station. The multiplexors in Figure 5 correspond to multiplexor 57 in Figure 3. Thus, when a
station's initiator sends a request signal to another station, that initiator uses the station's grant
multiplexor to select the corresponding grant line from the target arbiter for the other station.
The requesting station can then monitor that grant line for a grant signal from the other station.

The request lines preferably are not connected to the stations through
multiplexors. Instead, the request line(s) from each station's initiator preferably are directly
connected to each other station's target arbiter. Each station's target arbiter preferably is directly
connected to all request lines from all other stations. For example, if there are four stations, each
station's target arbiter preferably is connected to three sets of request lines, one from each of the
other stations. This arrangement allows stations to receive and to react extremely quickly to
request signals from other stations.

Station Sharing

Figure 6 illustrates one possible arrangement for plural on-chip components to 
share a single station according to the invention. This arrangement is particularly useful when 
several components are relatively slow compared to other components and/or to the on-chip 
communication system. 

In Figure 6, three components 100 to 102 share a station. These components are 
illustrated as USB interface 100, UART interface 101, and I2C interface 102. Of course, the 
invention is not limited to these particular components or to three components sharing a station. 
More or fewer components can share a station according to the invention. 

As shown in Figure 6, an additional arbiter 104, decoder 105 and multiplexor 106
are used to connect the plural components to a station. Comparing Figures 3 and 6, all of the
elements in Figure 6 take the place of component 51 in Figure 3, with additional signals provided
for address and flag information.

Arbiter 104 in Figure 6 further arbitrates grants and requests among the sharing
components. Decoder 105 decodes address and flag information so as to route incoming data to
the appropriate component. Multiplexor 106 likewise selects outgoing data from the appropriate
one of the components.

Other arrangements for sharing a station are possible and also fall within the scope
of the invention. In any case, sharing of a station by plural components conserves memory and
routing resources.
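By way of illustration only, the decoder and multiplexor roles in a shared station can be
sketched in Python. The address windows, the fixed selection order and all names below are
hypothetical; the disclosure does not specify how addresses are assigned or how the
additional arbiter orders the sharing components.

```python
# Hypothetical address windows for three components sharing one station.
ADDRESS_WINDOWS = {
    "USB": range(0x000, 0x100),
    "UART": range(0x100, 0x200),
    "I2C": range(0x200, 0x300),
}


def decode(address: int) -> str:
    """Decoder role: pick the component that should receive incoming data."""
    for name, window in ADDRESS_WINDOWS.items():
        if address in window:
            return name
    raise LookupError(f"no sharing component owns address {address:#x}")


def select_outgoing(pending, order=("USB", "UART", "I2C")):
    """Multiplexor role: pick outgoing data from one component.

    Assumption: the local arbiter uses a fixed order among the components.
    Returns (component, data) or None if nothing is waiting.
    """
    for name in order:
        if pending.get(name):
            return name, pending[name].pop(0)
    return None


pending = {"UART": ["u0"], "I2C": ["i0"]}
print(decode(0x120))             # UART
print(select_outgoing(pending))  # ('UART', 'u0')
```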

Split-Response Transaction Model 

The invention utilizes a split-response transaction model of communication. A 
write operation from one station to another is simple in this model. A first station requests 
permission to write to a second station. If the second station is available and has room in its 
incoming packet buffer, the second station grants the request. Then, the first station sends a write 
command to the second station, followed by the data. 

A read operation is slightly more complicated because a station preferably needs 
to make data available before it can be returned to the requesting station. In order to perform a 
read operation, a first station again requests permission to send a read request to a second station. 
However, instead of sending data, the first station sends a read request command. This command 
preferably includes address information for the data to be read. 

The second station responds to the read request command by initiating a read
completion operation to write the requested data to the first station. This read completion
operation is substantially identical to a write operation from the second station to the first station,
except that the second station indicates that the operation is a read completion. The second
station preferably makes this indication through the bus command portion of the track used to
send the data for the read operation.

The two-stage read operation is called a "split-response transaction" operation
because the operation is split over two transactions: a read request and a read completion. The
tag field in the header for any packets sent in response to a read request is used to align those
packets with the read request. In other words, the tag field is used to align a read request and the
resulting data across the split-response transaction.

Using the foregoing approach, all operations between stations involve transmitting
information from one station to another station for consumption. For a write, the information
includes a write command and the actual data to be written. For a read request, the information
includes a read request command and address information. For a read completion, the
information includes the data that was requested by the corresponding read request command,
along with an indicator that the data is for a read completion command.
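The write, read request and read completion operations described above can be sketched, as
a non-limiting example, with a small Python model in which the tag aligns a completion with
its request. The class, method and command names are hypothetical, request and grant
signalling is omitted, and tags are simply reused modulo 32 without checking whether an
earlier read with the same tag is still outstanding.

```python
class Station:
    """Toy station that handles write, read request and read completion packets."""

    def __init__(self, name, memory=None):
        self.name = name
        self.memory = dict(memory or {})
        self.outstanding = {}   # tag -> address of a read awaiting completion
        self.results = {}       # address -> data returned by a read completion
        self.next_tag = 0

    def make_read_request(self, address):
        tag = self.next_tag
        self.next_tag = (self.next_tag + 1) % 32   # tag field is 5 bits wide
        self.outstanding[tag] = address
        return {"cmd": "READ_REQUEST", "src": self.name, "tag": tag,
                "address": address}

    def handle(self, packet):
        """Consume a packet; return a reply packet if one is needed."""
        if packet["cmd"] == "WRITE":
            self.memory[packet["address"]] = packet["data"]
            return None
        if packet["cmd"] == "READ_REQUEST":
            # Roles reverse: this station now writes the data back.
            return {"cmd": "READ_COMPLETION", "src": self.name,
                    "tag": packet["tag"],
                    "data": self.memory[packet["address"]]}
        if packet["cmd"] == "READ_COMPLETION":
            address = self.outstanding.pop(packet["tag"])
            self.results[address] = packet["data"]
            return None
        raise ValueError(f"unknown command {packet['cmd']!r}")


first = Station("first")
second = Station("second", memory={0x40: 0xBEEF})
request = first.make_read_request(0x40)
completion = second.handle(request)
first.handle(completion)
print(hex(first.results[0x40]))  # 0xbeef
```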

Transmitting Information

Figure 7 is a flowchart for explaining communication between components across
an on-chip communication system according to the invention. The steps in Figure 7 are
discussed with reference to the elements of the station depicted in Figure 3 in order to improve
understanding of the invention. However, the method illustrated by Figure 7 is not limited to use
with the station shown in Figure 3.

In step S701, a component communicates with its station to request a data transfer
over the on-chip communication bus with another component connected to another station. This
data transfer could be a write operation or a read operation. The first and second stations
communicate with each other to accomplish the data transfer in steps S702 to S709.

In step S702, the first station's initiator sends a request signal to the second
station. This request is received by the second station's target arbiter.

As discussed above, the request signal preferably indicates a priority for the data
transfer. Write operations preferably are assigned higher priorities than read operations. Thus,
when requests are evaluated by the second station's target arbiter, writes can be executed before
any pending reads. This priority scheme facilitates use of the split-response transaction model
for communication between components. Without this priority scheme, a station could choose to
read (i.e., consume) data before an earlier-issued write was completed, possibly causing the
station to inadvertently read stale or inaccurate data.

In step S703, the second station's target arbiter evaluates all outstanding requests
from other stations, including the request from the first station. The target arbiter preferably
selects the request with the highest priority.
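As a minimal sketch of the selection in step S703, a target arbiter can be modeled in Python
as choosing the station whose request level is highest. The function name, the numeric
priority values for writes and reads, and the tie-breaking rule (alphabetical station name)
are assumptions; the disclosure says only that the highest priority request preferably is
selected and that writes preferably outrank reads.

```python
# Hypothetical 3-bit request levels; 0 means "no request".
WRITE_PRIORITY = 5
READ_PRIORITY = 2


def arbitrate(requests):
    """Return the station to grant, or None if no station is requesting.

    requests: station name -> 3-bit request level seen on its request lines.
    Ties are broken by station name, which is an arbitrary modeling choice.
    """
    active = {station: level for station, level in requests.items() if level != 0}
    if not active:
        return None
    # max() returns the first maximal item, so sorting fixes the tie-break.
    return max(sorted(active), key=lambda station: active[station])


requests = {"A": READ_PRIORITY, "C": WRITE_PRIORITY, "D": 0}
print(arbitrate(requests))  # C, because the pending write outranks the read
```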

In order to grant the first station's request, the second station's target arbiter sends
a grant signal to the first station in step S704. In step S705, the second station selects a track for
the data. In actual operation, steps S704 and S705 preferably occur simultaneously by sending a
grant signal from the second station's target arbiter to both the first station and to a track
multiplexor in the second station.

In response to the grant signal, the first station's initiator sends a command and/or
data to the second station in step S706. The command preferably is sent using the command/byte
enable signal lines of the selected track. Commands include, but are not limited to, write
commands, read request commands, and read completion commands. The data preferably is sent
using the 64 data lines in the selected track.

In step S707, the target at the second station receives the command and/or data.
Then, if the command is a read request, flow proceeds from step S707 through step S708 to step
S709. In step S709, the first and second stations reverse roles, and the station that received the
read request initiates a read completion command to send the data.
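
The split-response flow of steps S702 to S709 can be sketched, under simplifying assumptions, as follows. The Station class, its method names, and the command strings are hypothetical, and arbitration and track selection (steps S703 to S705) are collapsed into a single log entry because only one request is outstanding at a time in this sketch.

```python
class Station:
    """Minimal software stand-in for a station with an initiator and a target."""

    def __init__(self, name):
        self.name = name
        self.memory = {}    # data held by the component behind this station
        self.received = {}  # data returned by read completions
        self.log = []

    def request(self, target, command, address, data=None):
        # S702: the initiator sends a request to the target station.
        self.log.append(f"request {command} -> {target.name}")
        # S703 to S705: the target arbitrates, grants, and selects a track.
        target.log.append(f"grant -> {self.name}")
        # S706: the initiator sends the command and/or data.
        target.receive(self, command, address, data)

    def receive(self, initiator, command, address, data):
        # S707: the target receives the command and/or data.
        if command == "write":
            self.memory[address] = data
        elif command == "read_request":
            # S708 to S709: roles reverse; this station now initiates a
            # read completion that carries the requested data back.
            self.request(initiator, "read_completion", address, self.memory[address])
        elif command == "read_completion":
            self.received[address] = data
        else:
            raise ValueError(f"unknown command: {command}")


first = Station("first")
second = Station("second")
first.request(second, "write", 0x10, 99)
first.request(second, "read_request", 0x10)
```

After the two calls, the value written to the second station has been returned to the first station by a separate read completion transfer rather than on the original request.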

Other Operations

The on-chip communication system according to the invention also preferably can
execute register read and write operations for reading and writing to system registers. Because
these system registers preferably also are connected to the on-chip communication system
through a station, the process of reading and writing to the system registers is similar to that
discussed above. Additionally, the system preferably can execute special I/O commands, system
control commands (e.g., initialize, abort, etc.), and the like. System commands preferably are
directed toward a system controller connected to the system through a station.

The invention also can accommodate special direct memory access operations
among stations. These operations involve a special direct memory access station that is beyond
the scope of this disclosure. However, such stations can be connected to the on-chip
communication system disclosed herein without departing from the foregoing teachings.

Head-of-Line Blocking

Head-of-line blocking occurs when transmission of data at the front of a source 
queue is delayed because it is intended for a station or component that is busy, thereby blocking 
transmission of data deeper in the source queue that is intended for a station or component that is 
not busy. This type of blocking can greatly impact communication in a system. 

The invention addresses head-of-line blocking in at least three ways: through use 
of "tracks" that typically have twice as much bandwidth as is required by the source or 
destination of the data, through use of a packet-based communication protocol, and through use 
of a watchdog circuit. 

The on-chip communication system according to the invention can be very fast. 
Thus, any blocking that occurs is not likely to last long. This strength is enhanced by the 
system's ability to use a different clock domain for the communication bus than the components 
connected to the system. As a result, the on-chip communication system can operate at a higher 
clock speed than the components, further reducing the impact of any blocking. It is well known
that head-of-line blocking limits throughput to roughly 59% of the peak speed of the interconnect.
With a 2:1 overspeed in the interconnect, sources and destinations can achieve their
full data rate despite head-of-line blocking.
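
The arithmetic behind the overspeed argument can be checked with a short sketch. It assumes that the figure of roughly 59% refers to the classical saturation throughput of a large first-in, first-out input-queued switch under uniform random traffic, which is 2 - sqrt(2), or about 0.586; that identification and the function name are assumptions of this sketch, not statements from the disclosure.

```python
import math

# Assumed source of the "roughly 59%" figure: saturation throughput of a
# large FIFO input-queued switch under uniform random traffic.
HOL_LIMIT = 2 - math.sqrt(2)  # approximately 0.586


def sustainable_endpoint_fraction(overspeed, hol_limit=HOL_LIMIT):
    """Fraction of an endpoint's full data rate the interconnect can sustain.

    overspeed is the ratio of interconnect bandwidth to endpoint bandwidth.
    The result is capped at 1.0 because an endpoint cannot exceed its own rate.
    """
    return min(1.0, overspeed * hol_limit)


without_overspeed = sustainable_endpoint_fraction(1.0)
with_overspeed = sustainable_endpoint_fraction(2.0)
```

With no overspeed the endpoints are held to about 59% of their rate, whereas with a 2:1 overspeed the blocked interconnect still delivers about 1.17 times the endpoint rate, so the endpoints rather than the interconnect become the limit.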

The packet-based protocol used by the invention preferably limits both the number of
payloads and the length of each payload that can be sent in response to a grant of a request to
send data. As a result, no one data transfer operation is likely to tie up a station for too long,
thereby reducing the length of any blocking that does occur.
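
As a minimal sketch of this limit, the following function divides a transfer into grants, each carrying a bounded number of bounded-length payloads. The function name, the parameter names, and the particular limits in the example are hypothetical; the disclosure does not specify numeric values.

```python
def split_into_grants(total_bytes, max_payloads_per_grant, max_payload_bytes):
    """Return a list of grants; each grant is a list of payload sizes in bytes.

    A new request must be granted before each inner list can be sent, so a
    single transfer cannot occupy the target for more than one grant's worth
    of payloads at a time.
    """
    if total_bytes < 0 or max_payloads_per_grant < 1 or max_payload_bytes < 1:
        raise ValueError("sizes must be non-negative and limits must be positive")
    grants = []
    remaining = total_bytes
    while remaining > 0:
        grant = []
        while remaining > 0 and len(grant) < max_payloads_per_grant:
            size = min(max_payload_bytes, remaining)
            grant.append(size)
            remaining -= size
        grants.append(grant)
    return grants


# Illustrative limits only: two payloads per grant, 64 bytes per payload.
grants = split_into_grants(200, max_payloads_per_grant=2, max_payload_bytes=64)
```

Between consecutive grants the target arbiter is free to select a request from another station, which is what bounds the duration of any blocking.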

The on-chip communication system according to the invention also can include 
one or more watchdog circuits. Preferably, one watchdog circuit is provided for each station. 
These circuits can monitor the stations of the system to see if any station stalls or goes offline for 
more than a predetermined amount of time (e.g., 1/2 second). Preferably, the value for this
amount of time is stored in a system register for the on-chip communication system. 

If a station stalls or goes offline for more than the predetermined amount of time, 
that station's watchdog circuit can inform a controller for the communication system. The
controller can then instruct all stations to purge any pending or queued operations involving the 
offline or stalled station or to reroute those operations. Thus, if blocking occurs because of an 
offline or stalled station, the blocking is terminated after the predetermined amount of time. 
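
The watchdog behavior described above can be sketched in software as follows. The Watchdog and Controller classes, the per-station queue representation, and the polling interface are assumptions of this sketch, and only the purge option is shown (rerouting is omitted).

```python
class Watchdog:
    """Tracks the last time a station showed activity."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_activity = 0.0

    def note_activity(self, now):
        self.last_activity = now

    def expired(self, now):
        return now - self.last_activity > self.timeout


class Controller:
    """Purges operations involving any station whose watchdog has expired."""

    def __init__(self, stations, timeout):
        # The timeout stands in for the value stored in a system register.
        self.watchdogs = {s: Watchdog(timeout) for s in stations}
        self.queues = {s: [] for s in stations}  # source -> [(target, payload)]

    def enqueue(self, source, target, payload):
        self.queues[source].append((target, payload))

    def poll(self, now):
        stalled = {s for s, w in self.watchdogs.items() if w.expired(now)}
        for source in self.queues:
            if source in stalled:
                self.queues[source] = []
            else:
                self.queues[source] = [
                    (t, p) for (t, p) in self.queues[source] if t not in stalled
                ]
        return stalled


ctrl = Controller(["a", "b", "c"], timeout=0.5)
ctrl.enqueue("a", "b", "x")  # head of queue, destined for a station that stalls
ctrl.enqueue("a", "c", "y")  # blocked behind the entry for "b"
ctrl.watchdogs["a"].note_activity(0.4)
ctrl.watchdogs["c"].note_activity(0.4)
stalled = ctrl.poll(now=0.6)
```

In the example, station "b" shows no activity within the timeout, so the entry destined for it is purged and the entry for station "c" moves to the head of the source queue.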

Scalability 

The on-chip communication system according to the invention is scalable to large 
systems. This scalability is possible because relatively few components are required to interface 
each component to the system. Scalability also is facilitated by the ability of a station to interface 
plural components to the system. 

However, a problem does exist in that as the number of stations increases, the size
of the grant and track multiplexors also increases. This increase is not linear. Instead, the size of
the multiplexors increases by increasing amounts for each additional station. The increase is of
order N^2, where N is the number of stations. At some point, if conventional multiplexor circuitry
is used, the footprint of the multiplexors on the chip can become too large and unwieldy to place
on the chip.
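
The quadratic growth can be made concrete with a short calculation. The sketch assumes that each of the N stations has a multiplexor with one input from every other station, so the total number of multiplexor inputs is N(N - 1); this counting model and the function name are assumptions of the sketch rather than figures taken from the disclosure.

```python
def total_mux_inputs(n_stations):
    """Total multiplexor inputs if every station selects among all others."""
    if n_stations < 1:
        raise ValueError("at least one station is required")
    return n_stations * (n_stations - 1)


# Doubling the station count roughly quadruples the multiplexor inputs.
growth = [(n, total_mux_inputs(n)) for n in (4, 8, 16, 32)]
```

Under this model, going from 4 to 32 stations multiplies the station count by 8 but the multiplexor inputs by more than 80.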

The invention addresses the foregoing issue by constructing the multiplexors from
smaller multiplexors and other circuits distributed across the chip. The stations are
interconnected using these smaller multiplexors, thereby alleviating the problem of having to
place large multiplexor circuits at one location for each station on the chip.

Figure 8 illustrates a technique for interconnecting stations using smaller
multiplexors to improve scalability according to the invention. The invention also includes the
use of pipeline storage elements - D flip-flops - between some of the multiplexor stages in order
to maintain transmission speed when a track must traverse a large number of multiplexor stages.
The invention also includes adjusting the time of issuance of a grant to a new transmitting station
relative to the end of transmission of a current transmitting station according to the number and
location of pipeline storage elements traversed by the track in the switch fabric in order to
eliminate idle cycles between the end of a transmission and the start of a next, waiting
transmission.
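
The grant-timing adjustment can be illustrated with simple cycle arithmetic. The sketch assumes that each D flip-flop on a path adds exactly one clock cycle, that the grant and data paths may cross different numbers of flip-flops, and that the new transmitting station launches its first word in the same cycle in which it sees the grant; these timing assumptions and the function names are hypothetical.

```python
def first_word_arrival(grant_cycle, grant_path_flops, data_path_flops):
    """Cycle at which the new sender's first word reaches the receiver."""
    sender_sees_grant = grant_cycle + grant_path_flops
    return sender_sees_grant + data_path_flops


def grant_issue_cycle(end_cycle, grant_path_flops, data_path_flops):
    """Cycle at which to issue the next grant for back-to-back transfers.

    end_cycle is the cycle in which the last word of the current transmission
    arrives at the receiver. The grant is issued early by the total pipeline
    depth so that the next first word arrives at end_cycle + 1.
    """
    return end_cycle + 1 - grant_path_flops - data_path_flops


# Example: the current transfer ends at cycle 100; the grant path crosses one
# flip-flop and the data path crosses two.
issue = grant_issue_cycle(100, grant_path_flops=1, data_path_flops=2)
arrival = first_word_arrival(issue, grant_path_flops=1, data_path_flops=2)
```

Issuing the grant at cycle 98 in this example causes the next first word to arrive at cycle 101, leaving no idle cycle on the track; a grant issued only at cycle 101 would leave three idle cycles.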

In Figure 8, transmitters/receivers 108 to 111 are interconnected through a switch
fabric including D flip-flops 113 to 117 and small multiplexors 119 to 126. In this case, the
term "small" is in comparison to larger multiplexors that would be needed using conventional
circuitry.

The dashed lines in Figure 8 illustrate connections that could be made to 
accommodate more stations. Advantageously, no additional space need be used near the existing 
stations. As a result, scalability is improved. 

Alternative Embodiments 

Although preferred embodiments of the invention are disclosed herein, many 
variations are possible which remain within the content, scope and spirit of the invention, and 
these variations would become clear to those skilled in the art after perusal of this application. 